Kernelized Low Rank Representation on
Grassmann Manifolds

Boyue Wang, Yongli Hu Member, IEEE, Junbin Gao, Yanfeng Sun Member, IEEE, and Baocai Yin 


Abstract —Low rank representation (LRR) has recently attracted great interest due to its pleasing efficacy in exploring low-dimensional 
subspace structures embedded in data. One of its successful applications is subspace clustering which means data are clustered 
according to the subspaces they belong to. In this paper, at a higher level, we intend to cluster subspaces into classes of subspaces. 
This is naturally described as a clustering problem on Grassmann manifold. The novelty of this paper is to generalize LRR on Euclidean 
space onto an LRR model on Grassmann manifold in a uniform kernelized framework. The new methods have many applications in 
computer vision tasks. Several clustering experiments are conducted on handwritten digit images, dynamic textures, human face 
clips and traffic scene sequences. The experimental results show that the proposed methods outperform a number of state-of-the-art 
subspace clustering methods. 

Index Terms —Low Rank Representation, Subspace Clustering, Grassmann Manifold, Kernelized Method 



1 Introduction 

In the past years, subspace clustering or segmentation has attracted great
interest in computer vision, pattern recognition and signal processing. The
basic idea of subspace clustering is based on the fact that most data often
have intrinsic subspace structures and can be regarded as samples from a
mixture of multiple subspaces. Thus the main goal of subspace clustering is
to group data into different clusters such that the data points in each
cluster come from exactly one subspace. To investigate and represent the
underlying subspace structure, many subspace methods have been proposed,
such as the conventional iterative methods, the statistical methods [7], the
factorization-based algebraic approaches and the spectral clustering-based
methods. They have been successfully applied in many scenarios, such as
image representation [10], motion segmentation, face classification and
saliency detection.

Among all the subspace clustering methods mentioned above, the spectral
clustering methods based on an affinity matrix are considered to have good
prospects. In these methods, an affinity matrix is first learned from the
given data and the final clustering results are then obtained by spectral
clustering algorithms such as K-means or Normalized Cuts (NCut) [17]. The
main component of the spectral clustering methods is the construction of a
proper affinity matrix for different data. In the typical method, Sparse
Subspace Clustering (SSC), one assumes that the data of subspaces are
independent and are sparsely represented under the so-called $\ell_1$
Subspace Detection Property, in which the within-class affinities are sparse
and the between-class affinities are all zeros. It has been proved that
under certain conditions the multiple subspace structures can be exactly
recovered via $\ell_p$ ($p < 1$) minimization. Most current sparse subspace
methods focus mainly on independent sparse representation of data objects.

However, the relation among data objects, or the underlying structure of the
subspaces that generate the subsets of data to be grouped, is usually not
well considered, although these intrinsic properties are very important for
clustering applications. Some researchers therefore explore these intrinsic
properties and relations among data objects and revise the sparse
representation model by introducing extra constraints, such as the Label
Consistent constraint, the Sequential property, the Low rank constraint and
its Laplace regularization. Among these constraints, holistic constraints
such as the low rank or nuclear norm $\|\cdot\|_*$ are proposed in favour of
structural sparsity. The Low Rank Representation (LRR) model is one of the
representatives. The LRR model tries to reveal the latent sparse property
embedded in a data set in a high dimensional space. It has been proved that,
when the high-dimensional data set is actually from a union of several low
dimensional subspaces, the LRR model can reveal this structure through
subspace clustering.

Although most current subspace clustering methods show good performance in
various applications, the similarity among data objects is measured in the
original data domain. For example, the current LRR method is based on the
principle of data self-representation, and the representation error is
measured in terms of a Euclidean-like distance. However, this hypothesis may
not always hold for high-dimensional data in practice, where data may not
reside in a linear space. In fact, it has been proved that many
high-dimensional data are embedded in low dimensional manifolds. For
example, human face images are considered as samples from a non-linear
submanifold. It is desirable to reveal the nonlinear manifold structure
underlying these high-dimensional data.

There are two types of manifold related learning tasks. In so-called
manifold learning, one has to respect the local geometry that exists in the
data but is unknown to the learner. The classic representative algorithms
for manifold learning include LLE (Locally Linear Embedding), ISOMAP, LLP
(Locally Linear Projection), LE (Laplacian Embedding) and LTSA (Local
Tangent Space Alignment). In the other type of learning task, we clearly
know the manifold from which the data come. For example, in image analysis,
people usually use covariance matrices of features as a region descriptor.
In this case, one must respect the fact that the descriptor is a point on
the manifold of symmetric positive definite matrices. In dealing with data
from a known manifold, one powerful way is to use a non-linear mapping to
"flatten" the data, as in kernel methods. In computer vision, it is common
to collect data on the so-called Grassmann manifold. In these cases, the
properties of the manifold are known, so how to incorporate the manifold
properties into practical tasks is a challenging problem. This type of task,
which incorporates manifold properties in learning, is called learning on
manifolds.

In this paper, we explore the LRR model for clustering a set of data objects
on Grassmann manifold. The intrinsic characteristics and geometric
properties of Grassmann manifold will be exploited in the algorithm design
of LRR learning. Grassmann manifold has the nice property that it can be
embedded into the linear space of symmetric matrices. In this way, all the
abstract points (subspaces) on Grassmann manifold can be embedded into a
Euclidean space where the classic LRR model can be applied. An LRR model can
then be constructed in the embedding space, where the error measure is
simply taken as the Euclidean metric. This idea can also be seen in the
recent work [32] for computer vision tasks.

The contributions of this work are listed as follows:

• Reviewing and extending the LRR model on Grassmann manifold introduced in
our conference paper;

• Giving the solutions and practical algorithms to the problems of the
extended Grassmann LRR model under different noise models, particularly
those defined by the Frobenius norm and the $\ell_2/\ell_1$ norm;

• Presenting a new kernelized LRR model on Grassmann manifold.

The rest of the paper is organized as follows. In Section 2, we review some
related works. In Section 3, the proposed LRR on Grassmann Manifold (GLRR)
is described and the solutions to the GLRR models with different noise
assumptions are given in detail. In Section 4, we introduce a general
framework for the LRR model on Grassmann manifold from the kernelization
point of view. In Section 5, the performance of the proposed methods is
evaluated on clustering problems with several public databases. Finally,
conclusions and suggestions for future work are provided in the last
section.

2 Related Works 

In this section, we briefly review the existing sparse 
subspace clustering methods including the classic Sparse 
Subspace Clustering (SSC) and the Low Rank Repre¬ 
sentation (LRR) and summarize the properties of Grass¬ 
mann manifold that are related to the work presented in 
this paper. 

2.1 Sparse Subspace Clustering (SSC) 

Given a set of data drawn from a union of unknown subspaces, the task of
subspace clustering is to find the number of subspaces, their dimensions and
bases, and then to segment the data set according to the subspaces. In
recent years, sparse representation has been applied to subspace clustering,
and the proposed Sparse Subspace Clustering (SSC) aims to find the sparsest
representation of the data set using the $\ell_1$ approximation. The general
SSC can be formulated as follows:

$$\min_{E,Z} \|E\|_{\ell} + \lambda\|Z\|_1 \quad \text{s.t.} \quad Y = DZ + E,\ \operatorname{diag}(Z) = 0, \tag{1}$$

where $Y \in \mathbb{R}^{d \times N}$ is a set of $N$ signals in dimension
$d$, $Z$ is the corresponding sparse representation of $Y$ under the
dictionary $D$, and $E$ represents the observation noise or the error
between the signals and their reconstructed values, which is measured by the
norm $\|\cdot\|_{\ell}$. In particular, $\ell = 2$ (or $\ell = F$), the
Euclidean (Frobenius) norm, deals with Gaussian noise; $\ell = 1$ (Laplacian
noise) deals with random gross corruptions; and $\ell = \ell_2/\ell_1$ deals
with sample-specific corruptions. Finally, $\lambda > 0$ is a penalty
parameter that balances the sparse term and the reconstruction error.

In the above sparse model, it is critical to use an appropriate dictionary
$D$ to represent the signals. Generally, a dictionary can be learned from
training data by using one of many dictionary learning methods, such as the
K-SVD method [34]. However, a dictionary learning procedure is usually
time-consuming and so should be done offline. Many researchers therefore
adopt a simple and direct approach and use the original signals themselves
as the dictionary to find subspaces, which is known as the
self-expressiveness property: each data point in a union of subspaces can be
efficiently reconstructed by a linear combination of other points in the
dataset. More specifically, every point in the dataset can be represented as
a sparse linear combination of other points from the same subspace.
Mathematically we write this sparse formulation as

$$\min_{E,Z} \|E\|_{\ell} + \lambda\|Z\|_1 \quad \text{s.t.} \quad Y = YZ + E,\ \operatorname{diag}(Z) = 0. \tag{2}$$

From these sparse representations an affinity matrix $Z$ is compiled. This
affinity matrix is interpreted as a graph upon which a clustering algorithm
such as Normalized Cuts (NCut) [17] is applied for the final segmentation.
This is the typical approach of modern subspace clustering techniques.
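
To make the last step concrete, the following is a minimal sketch of how a learned representation matrix could be turned into a graph for spectral clustering. The symmetrization $W = (|Z| + |Z|^T)/2$ and the use of the normalized graph Laplacian are common choices assumed here for illustration; they are not specified in the text above, and the function names are hypothetical.

```python
import numpy as np

def affinity_from_representation(Z):
    """Symmetric non-negative affinity W = (|Z| + |Z|^T) / 2 (an assumed, common choice)."""
    A = np.abs(Z)
    return 0.5 * (A + A.T)

def normalized_laplacian_spectrum(W):
    """Eigenvalues and eigenvectors (ascending) of L = I - D^{-1/2} W D^{-1/2}."""
    degree = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    L = np.eye(W.shape[0]) - (inv_sqrt[:, None] * W) * inv_sqrt[None, :]
    return np.linalg.eigh(L)

# Toy block-diagonal representation: points 0-2 and points 3-4 are only
# expressed by members of their own group.
Z = np.zeros((5, 5))
Z[:3, :3] = 0.5
Z[3:, 3:] = 0.5
np.fill_diagonal(Z, 0.0)  # diag(Z) = 0, as in the constraint of (2)

W = affinity_from_representation(Z)
eigenvalues, eigenvectors = normalized_laplacian_spectrum(W)
```

In this toy case the number of (numerically) zero eigenvalues equals the number of groups, and the corresponding eigenvectors are the usual input to K-means in a spectral clustering pipeline.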


2.2 Low-Rank Representation (LRR) 

The LRR can be regarded as one special type of sparse representation in
which, rather than computing the sparsest representation of each data point
individually, the global structure of the data is captured collectively by
the lowest rank representation of a set of data points. The low rank
measurement has long been utilized in matrix completion from corrupted or
missing data. Specifically for clustering applications, it has been proved
that, when a high-dimensional data set is actually composed of data from a
union of several low dimensional subspaces, the LRR model can reveal this
structure through subspace clustering. It has also been shown that LRR has
good clustering performance in dealing with the challenges in subspace
clustering, such as unclean data corrupted by noise or outliers, no prior
knowledge of the subspace parameters, and the lack of theoretical guarantees
for the optimality of the method. The general LRR model can be formulated as
the following optimization problem:

$$\min_{E,Z} \|E\|_{\ell} + \lambda\|Z\|_* \quad \text{s.t.} \quad Y = YZ + E, \tag{3}$$

where $Z$ is the low rank representation of the data set $Y$ by itself. Here
the low rank constraint is achieved by approximating the rank with the
nuclear norm $\|\cdot\|_*$, which is defined as the sum of the singular
values of a matrix and is the lower envelope of the rank function of
matrices.

Although the current LRR method has good performance in subspace clustering,
it relies on the Euclidean distance for measuring the similarity of the raw
data. However, this measurement is not suitable for high-dimensional data
with an embedded low dimensional manifold structure. To characterize the
local geometry of data on an unknown manifold, the LapLRR method uses the
graph Laplacian matrix derived from the data objects as a regularization
term for the LRR model to represent the nonlinear structure of high
dimensional data, while the reconstruction error of the revised model is
still computed in Euclidean space.


2.3 Grassmann Manifold 

This paper is concerned with points on a known manifold. Generally,
manifolds can be considered as low dimensional smooth "surfaces" embedded in
a higher dimensional Euclidean space. At each point, the manifold is locally
similar to Euclidean space. In recent years, Grassmann manifold has
attracted great interest in the computer vision research community. Although
Grassmann manifold itself is an abstract manifold, it can be well
represented as a matrix quotient manifold, and its Riemannian geometry has
been investigated for algorithmic computation [39].

Grassmann manifold has the nice property that it can be embedded into the
space of symmetric matrices via the projection embedding; see Section 3
below. This property has been used in subspace analysis, learning and
representation. Sparse coding and dictionary learning within the space of
symmetric positive definite matrices have been investigated by using kernel
methods. For clustering applications, the mean shift method was discussed on
Stiefel and Grassmann manifolds in [44]. Recently, a new version of the
K-means method, constructed by a statistical modeling method, was proposed
to cluster Grassmann points. These works try to extend clustering methods
from Euclidean space to more practical situations on nonlinear spaces. Along
this direction, we further explore the subspace clustering problem on
Grassmann manifold and try to establish a novel and feasible LRR model on
Grassmann manifold.


3 LRR on Grassmann Manifolds
3.1 LRR on Grassmann Manifolds

In most cases, the reconstruction error of the LRR model in (3) is computed
in the original data domain. For example, the common form of the
reconstruction error is the Frobenius norm in the original data space, i.e.
the error term can be chosen as $\|Y - YZ\|_F^2$. In practice, many high
dimensional data have intrinsic manifold structures. For example, it has
been proved that human faces in images have an underlying manifold
structure. In an ideal scenario, the error should be measured according to
the manifold geometry. So we consider signal representation for data with
manifold structure and employ an error measurement in the LRR model based on
the distance defined on manifold spaces.

However, the linear relation defined by $Y = YZ + E$ is no longer valid on a
manifold. One way to get around this difficulty is to use the log map on a
manifold to lift points (data) on the manifold onto the tangent space at a
data point. This idea has been applied for clustering and dimensionality
reduction on manifolds in [47].

However, when the underlying manifold is Grassmannian, we can use the
distance over its embedded space to replace the manifold distance, and the
linear relation can be implemented naturally in its embedding Euclidean
space, as detailed below.

Grassmann manifold $\mathcal{G}(p, d)$ is the space of all $p$-dimensional
linear subspaces of $\mathbb{R}^d$ for $0 < p < d$. A point on Grassmann
manifold is a $p$-dimensional subspace of $\mathbb{R}^d$ which can be
represented by any orthonormal basis
$X = [x_1, x_2, \ldots, x_p] \in \mathbb{R}^{d \times p}$. The chosen
orthonormal basis is called a representative of the subspace
$\mathcal{S} = \operatorname{span}(X)$. Grassmann manifold
$\mathcal{G}(p, d)$ has a one-to-one correspondence to a quotient manifold
of $\mathbb{R}^{d \times p}$. On the other hand, we can embed Grassmann
manifold $\mathcal{G}(p, d)$ into the space of $d \times d$ symmetric
matrices $\operatorname{Sym}(d)$ by the following mapping:

$$\Pi : \mathcal{G}(p, d) \to \operatorname{Sym}(d), \quad \Pi(X) = XX^T. \tag{4}$$

Fig. 1. The GLRR Model. Under the mapping of the points on Grassmann
manifold, the tensor $\mathcal{X}$, with each slice being a symmetric
matrix, can be represented by a linear combination of itself. The element
$Z_{ij}$ of $Z$ represents the similarity between slices $i$ and $j$.

The embedding $\Pi(X)$ is a diffeomorphism (a one-to-one, continuous,
differentiable mapping with a continuous, differentiable inverse). It is
then reasonable to replace the distance on Grassmann manifold by the
following distance defined on the symmetric matrix space under this mapping:

$$\delta(X_1, X_2) = \|\Pi(X_1) - \Pi(X_2)\|_F = \|X_1X_1^T - X_2X_2^T\|_F. \tag{5}$$
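
As a small numerical illustration of the embedding (4) and the distance (5), the sketch below builds orthonormal bases with a QR factorization and checks that two different bases of the same subspace are mapped to the same symmetric matrix. The helper names and the random test data are assumptions made for this example only.

```python
import numpy as np

def orthonormal_basis(A):
    """Return an orthonormal basis (d x p) of the column space of A via thin QR."""
    Q, _ = np.linalg.qr(A)
    return Q

def projection_embedding(X):
    """Map an orthonormal basis X (d x p) to the symmetric matrix X X^T, as in (4)."""
    return X @ X.T

def embedding_distance(X1, X2):
    """Frobenius distance between the embedded points, as in (5)."""
    return np.linalg.norm(projection_embedding(X1) - projection_embedding(X2), "fro")

rng = np.random.default_rng(0)
d, p = 6, 2
X1 = orthonormal_basis(rng.standard_normal((d, p)))
X2 = orthonormal_basis(rng.standard_normal((d, p)))

# A different orthonormal basis of the same subspace: rotate the columns of X1.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X1_rotated = X1 @ R

same_subspace_distance = embedding_distance(X1, X1_rotated)
different_subspace_distance = embedding_distance(X1, X2)
```

The distance depends only on the subspace and not on the chosen representative, which is what makes the embedded matrices suitable inputs for a Euclidean LRR model.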

3.1.1 LRR on Grassmann Manifold with Gaussian Noise (GLRR-F)

Given a set of data points $\{X_1, X_2, \ldots, X_N\}$ on Grassmann
manifold, i.e., a set of subspaces
$\{\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_N\}$ of dimension $p$
accordingly, we have their mapped symmetric matrices
$\{X_1X_1^T, X_2X_2^T, \ldots, X_NX_N^T\} \subset \operatorname{Sym}(d)$.
Similar to the LRR model in (3), we represent these symmetric matrices by
themselves and use the error measurement defined in (5) to construct the LRR
model on Grassmann manifold as follows:

$$\min_{\mathcal{E},Z} \|\mathcal{E}\|_F^2 + \lambda\|Z\|_* \quad \text{s.t.} \quad \mathcal{X} = \mathcal{X} \times_3 Z + \mathcal{E}, \tag{6}$$

where $\mathcal{X}$ is a 3-order tensor obtained by stacking all mapped
symmetric matrices
$\mathcal{X} = \{X_1X_1^T, X_2X_2^T, \ldots, X_NX_N^T\}$ along the 3rd mode,
$\mathcal{E}$ is the error tensor and $\times_3$ means the mode-3
multiplication of a tensor and a matrix, see [50]. The representation of
$\mathcal{X}$ and the 3-order product operation are illustrated in Fig. 1.

The use of the Frobenius norm in (6) makes the assumption that the model
fits Gaussian noise. We call this model the Frobenius norm constrained GLRR
(GLRR-F). In this case, we have

$$\|\mathcal{E}\|_F^2 = \sum_{i=1}^{N} \|E(:,:,i)\|_F^2, \tag{7}$$

where $E(:,:,i) = X_iX_i^T - \sum_{j=1}^{N} Z_{ij}(X_jX_j^T)$ is the $i$-th
slice of $\mathcal{E}$, which represents the distance between the symmetric
matrix $X_iX_i^T$ and its reconstruction
$\sum_{j=1}^{N} Z_{ij}(X_jX_j^T)$.

3.1.2 LRR on Grassmann Manifold with $\ell_2/\ell_1$ Noise (GLRR-21)

When there exist outliers in the data set, the Gaussian noise model is no
longer a favoured choice. Instead we propose using the so-called
$\|\cdot\|_{\ell_2/\ell_1}$ noise model. For example, in LRR clustering
applications [14], $\|\cdot\|_{\ell_2/\ell_1}$ is used to cope with
columnwise gross errors in signals. In a similar fashion, we formulate the
following $\ell_2/\ell_1$ norm constrained GLRR model (GLRR-21):

$$\min_{\mathcal{E},Z} \|\mathcal{E}\|_{\ell_2/\ell_1} + \lambda\|Z\|_* \quad \text{s.t.} \quad \mathcal{X} = \mathcal{X} \times_3 Z + \mathcal{E}, \tag{8}$$

where the $\|\mathcal{E}\|_{\ell_2/\ell_1}$ norm of a tensor is defined as
the sum of the Frobenius norms of the 3-mode slices, in the following form:

$$\|\mathcal{E}\|_{\ell_2/\ell_1} = \sum_{i=1}^{N} \|E(:,:,i)\|_F. \tag{9}$$

Note that (9), which has no squares, is different from (7).
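
The two error measures can be compared side by side in a few lines of NumPy. This is only a sketch: it assumes the convention that slice $i$ of $\mathcal{X} \times_3 Z$ is $\sum_j Z_{ij} X_jX_j^T$ (as in the slice expression given for GLRR-F), and the function names and random data are hypothetical.

```python
import numpy as np

def mode3_product(X, Z):
    """Mode-3 product: slice i of the result is sum_j Z[i, j] * X[:, :, j]."""
    return np.einsum("abj,ij->abi", X, Z)

def frobenius_sq_error(E):
    """Squared Frobenius norm of the error tensor (sum of squared slice norms)."""
    return float(np.sum(E ** 2))

def l2_l1_error(E):
    """Sum of the unsquared Frobenius norms of the mode-3 slices."""
    slice_norms = np.sqrt(np.sum(E ** 2, axis=(0, 1)))
    return float(np.sum(slice_norms))

rng = np.random.default_rng(1)
d, p, N = 5, 2, 4
bases = [np.linalg.qr(rng.standard_normal((d, p)))[0] for _ in range(N)]
X = np.stack([B @ B.T for B in bases], axis=2)  # d x d x N tensor of embedded points
Z = rng.standard_normal((N, N))                 # an arbitrary representation matrix
E = X - mode3_product(X, Z)                     # error tensor for this Z
```

Because the slice norms are not squared in the $\ell_2/\ell_1$ measure, a single badly reconstructed slice (an outlier sample) contributes linearly rather than quadratically to the objective.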


3.2 Algorithms for LRR on Grassmann Manifold 

The GLRR models in (6) and (8) present two typical optimization problems. In
this subsection, we propose appropriate algorithms to solve them.

The GLRR-F model was proposed in our earlier ACCV paper, where an algorithm
based on ADMM was proposed. In this paper, we provide an even faster closed
form solution for (6) and further investigate the structure of the tensor
used in these models to obtain a practical solution for (8).

Intuitively, the tensor calculation can be converted to matrix operations by
tensorial matricization, see [50]. For example, we can matricize the tensor
$\mathcal{X} \in \mathbb{R}^{d \times d \times N}$ in mode-3 and obtain a
matrix $X_{(3)} \in \mathbb{R}^{N \times (d * d)}$ of $N$ data points (in
rows). So it seems that the problem can be solved using the method of the
standard LRR model. However, as the dimension $d * d$ is often too large in
practical problems, the existing LRR algorithm could break down. To avoid
this scenario, we carefully analyze the representation of the
reconstruction error terms of the tensor and convert the optimization
problems to equivalent and readily solvable optimization models. In the
following two subsections, we give the details of these solutions.


3.2.1 Algorithm for the Frobenius Norm Constrained 
GLRR Model 

We follow the notation used in [33]. By using variable elimination, we can
convert problem (6) into the following problem

$$\min_{Z} \|\mathcal{X} - \mathcal{X} \times_3 Z\|_F^2 + \lambda\|Z\|_*. \tag{10}$$

We note that $(X_j^T X_i)$ has a small dimension $p \times p$ which is easy
to handle. To simplify the expression of the objective function (10), we
denote

$$\Delta_{ij} = \operatorname{tr}\left[(X_j^T X_i)(X_i^T X_j)\right]. \tag{11}$$

Clearly $\Delta_{ij} = \Delta_{ji}$. Define an $N \times N$ symmetric matrix

$$\Delta = [\Delta_{ij}]_{i,j=1}^{N}. \tag{12}$$

Then we have the following Lemma.


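Lemma 1. Given a set of matrices $\{X_1, X_2, \ldots, X_N\}$ s.t.
$X_i \in \mathbb{R}^{d \times p}$ and $X_i^T X_i = I$, if
$\Delta = [\Delta_{ij}]_{i,j} \in \mathbb{R}^{N \times N}$ with element
$\Delta_{ij} = \operatorname{tr}\left[(X_j^T X_i)(X_i^T X_j)\right]$, then
the matrix $\Delta$ is semi-positive definite.

Proof: Denote by $B_i = X_iX_i^T$. Then $B_i$ is a symmetric matrix of size
$d \times d$. Then

$$\Delta_{ij} = \operatorname{tr}\left[(X_j^T X_i)(X_i^T X_j)\right] = \operatorname{tr}\left[(X_jX_j^T)(X_iX_i^T)\right] = \operatorname{vec}(B_i)^T \operatorname{vec}(B_j),$$

where $\operatorname{vec}(\cdot)$ is the vectorization of a matrix.

Define a matrix
$B = [\operatorname{vec}(B_1), \operatorname{vec}(B_2), \ldots, \operatorname{vec}(B_N)]$.
Then it is easy to show that

$$\Delta = [\Delta_{ij}]_{i,j} = \left[\operatorname{vec}(B_i)^T \operatorname{vec}(B_j)\right]_{i,j} = B^T B.$$

So $\Delta$ is a semi-positive definite matrix. □

Based on the conclusion from Lemma 1, we have the eigenvector decomposition
for $\Delta$ defined by

$$\Delta = UDU^T,$$

where $U^T U = I$ and $D = \operatorname{diag}(\sigma_i)$ with nonnegative
eigenvalues $\sigma_i$. Define the square root of $\Delta$ by

$$\Delta^{\frac{1}{2}} = UD^{\frac{1}{2}}U^T,$$

then it is not hard to prove that problem (10) is equivalent to the
following problem

$$\min_{Z} \|Z\Delta^{\frac{1}{2}} - \Delta^{\frac{1}{2}}\|_F^2 + \lambda\|Z\|_*. \tag{13}$$

Finally we have

Theorem 2. Given that $\Delta = UDU^T$ as defined above, the solution to
(13) is given by

$$Z^* = UD_{\lambda}U^T,$$

where $D_{\lambda}$ is a diagonal matrix with its $i$-th element defined by

$$D_{\lambda}(i,i) = \begin{cases} 1 - \dfrac{\lambda}{\sigma_i} & \text{if } \sigma_i > \lambda, \\ 0 & \text{otherwise.} \end{cases}$$

Proof: Please refer to the proof of Lemma 1 in [15]. □

According to Theorem 2, the main cost of solving the LRR on Grassmann
manifold problem (6) is (i) calculating the symmetric matrix $\Delta$ and
(ii) an SVD of $\Delta$. This is a significant improvement over the
ADMM-based algorithm presented in our earlier paper.


3.2.2 Algorithm for the $\ell_2/\ell_1$ Norm Constrained GLRR Model

Now we turn to the GLRR-21 problem (8). Because of the $\ell_2/\ell_1$ norm
in the error measure, the objective function is convex but not
differentiable. We propose using the alternating direction method (ADM) to
solve this problem.

Firstly, we construct the following augmented Lagrangian function:

$$L(\mathcal{E}, Z, \xi) = \|\mathcal{E}\|_{\ell_2/\ell_1} + \lambda\|Z\|_* + \langle \xi, \mathcal{X} - \mathcal{X} \times_3 Z - \mathcal{E} \rangle + \frac{\mu}{2}\|\mathcal{X} - \mathcal{X} \times_3 Z - \mathcal{E}\|_F^2, \tag{14}$$

where $\langle \cdot, \cdot \rangle$ is the standard inner product of two
tensors of the same order, $\xi$ is the Lagrange multiplier, and $\mu$ is
the penalty parameter.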

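ADM is then used to decompose the simultaneous minimization of $L$ w.r.t.
$\mathcal{E}$ and $Z$ into two subproblems w.r.t. $\mathcal{E}$ and $Z$,
respectively. More specifically, the iteration of ADM goes as follows:

$$\mathcal{E}^{k+1} = \arg\min_{\mathcal{E}} L(\mathcal{E}, Z^k, \xi^k) = \arg\min_{\mathcal{E}} \|\mathcal{E}\|_{\ell_2/\ell_1} + \langle \xi^k, \mathcal{X} - \mathcal{X} \times_3 Z^k - \mathcal{E} \rangle + \frac{\mu^k}{2}\|\mathcal{X} - \mathcal{X} \times_3 Z^k - \mathcal{E}\|_F^2, \tag{15}$$

$$Z^{k+1} = \arg\min_{Z} L(\mathcal{E}^{k+1}, Z, \xi^k) = \arg\min_{Z} \lambda\|Z\|_* + \langle \xi^k, \mathcal{X} - \mathcal{X} \times_3 Z - \mathcal{E}^{k+1} \rangle + \frac{\mu^k}{2}\|\mathcal{X} - \mathcal{X} \times_3 Z - \mathcal{E}^{k+1}\|_F^2, \tag{16}$$

$$\xi^{k+1} = \xi^k + \mu^k\left[\mathcal{X} - \mathcal{X} \times_3 Z^{k+1} - \mathcal{E}^{k+1}\right], \tag{17}$$

where we have used an adaptive parameter $\mu^k$. The adaptive rule will be
specified later in Algorithm 1.

The above ADM is appealing only if we can find closed form solutions for the
subproblems (15) and (16).

First we consider problem (15). Denote
$\mathcal{C}^k = \mathcal{X} - \mathcal{X} \times_3 Z^k$ and, for any
3-order tensor $\mathcal{A}$, use $A(i)$ to denote the $i$-th slice
$\mathcal{A}(:,:,i)$ along the 3-mode as a shortened notation. Then we
observe that (15) is separable in terms of the matrix variables $E(i)$ as
follows:

$$E^{k+1}(i) = \arg\min_{E(i)} \|E(i)\|_F + \langle \xi^k(i), C^k(i) - E(i) \rangle + \frac{\mu^k}{2}\|C^k(i) - E(i)\|_F^2 = \arg\min_{E(i)} \|E(i)\|_F + \frac{\mu^k}{2}\left\|C^k(i) - E(i) + \frac{1}{\mu^k}\xi^k(i)\right\|_F^2. \tag{18}$$

From Lemma 3.2 in the cited reference, we know that the problem in (18) has
a closed form solution, given by

$$E^{k+1}(i) = \begin{cases} 0 & \text{if } M < \dfrac{1}{\mu^k}, \\ \left(1 - \dfrac{1}{M\mu^k}\right)\left(C^k(i) + \dfrac{1}{\mu^k}\xi^k(i)\right) & \text{otherwise,} \end{cases} \tag{19}$$

where $M = \left\|C^k(i) + \frac{1}{\mu^k}\xi^k(i)\right\|_F$.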
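
The slice-wise update (19) is a standard group shrinkage and can be sketched as follows. The function name and the small hand-made example are assumptions for illustration; the tensors are stored as `d x d x N` arrays with the slices along the last axis.

```python
import numpy as np

def update_error_slices(C, xi, mu):
    """Slice-wise shrinkage for the E-subproblem.

    C, xi: d x d x N tensors (residual and multiplier); mu: positive penalty.
    Each slice W_i = C_i + xi_i / mu is scaled by max(1 - 1 / (mu * ||W_i||_F), 0).
    """
    W = C + xi / mu
    norms = np.sqrt(np.sum(W ** 2, axis=(0, 1)))  # the value M for every slice
    scale = np.zeros_like(norms)
    keep = norms > 1.0 / mu                       # slices that are not set to zero
    scale[keep] = 1.0 - 1.0 / (mu * norms[keep])
    return W * scale[np.newaxis, np.newaxis, :]

# Two slices: one with Frobenius norm 5 (shrunk by the factor 0.8 when mu = 1)
# and one with norm 0.1 (set to zero when mu = 1).
C = np.zeros((2, 2, 2))
C[:, :, 0] = np.array([[3.0, 0.0], [0.0, 4.0]])
C[:, :, 1] = np.array([[0.1, 0.0], [0.0, 0.0]])
xi = np.zeros_like(C)
E_new = update_error_slices(C, xi, mu=1.0)
```

Slices whose norm falls below the threshold $1/\mu^k$ are removed entirely, which is how the $\ell_2/\ell_1$ model isolates sample-specific corruptions.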
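Denoting by

$$f(Z) = \langle \xi^k, \mathcal{X} - \mathcal{X} \times_3 Z - \mathcal{E}^{k+1} \rangle + \frac{\mu^k}{2}\|\mathcal{X} - \mathcal{X} \times_3 Z - \mathcal{E}^{k+1}\|_F^2,$$

problem (16) becomes

$$Z^{k+1} = \arg\min_{Z} \lambda\|Z\|_* + f(Z). \tag{20}$$

We adopt the linearization method to solve the above problem. For this
purpose, we need to compute $\partial f(Z)$ w.r.t. $Z$. To do so, we first
use the matrices in each slice to compute the tensor operations in the
definition of $f(Z)$. For the $i$-th slice of the first term in $f(Z)$, we
have

$$\left\langle \xi^k(i),\ X_iX_i^T - \sum_{j=1}^{N} Z_{ij}X_jX_j^T - E^{k+1}(i) \right\rangle = -\sum_{j=1}^{N} Z_{ij}\operatorname{tr}\left(\xi^k(i)^T X_jX_j^T\right) + \operatorname{tr}\left(\xi^k(i)^T\left(X_iX_i^T - E^{k+1}(i)\right)\right).$$

Define a new matrix by

$$\Phi^k = \left[\operatorname{tr}\left(\xi^k(i)^T X_jX_j^T\right)\right]_{i,j},$$

then the first term in $f(Z)$ has the following representation:

$$\langle \xi^k, \mathcal{X} - \mathcal{X} \times_3 Z - \mathcal{E}^{k+1} \rangle = -\operatorname{tr}(\Phi^k Z^T) + \text{const}. \tag{21}$$

For the $i$-th slice of the second term of $f(Z)$, we have

$$\begin{aligned} \left\|X_iX_i^T - \sum_{j=1}^{N} Z_{ij}X_jX_j^T - E^{k+1}(i)\right\|_F^2 ={}& \operatorname{tr}\left((X_iX_i^T)^T X_iX_i^T\right) + \operatorname{tr}\left(E^{k+1}(i)^T E^{k+1}(i)\right) \\ &+ \sum_{j_1=1}^{N}\sum_{j_2=1}^{N} Z_{ij_1}Z_{ij_2}\operatorname{tr}\left((X_{j_1}X_{j_1}^T)^T(X_{j_2}X_{j_2}^T)\right) \\ &- 2\operatorname{tr}\left((X_iX_i^T)^T E^{k+1}(i)\right) \\ &- 2\sum_{j=1}^{N} Z_{ij}\operatorname{tr}\left((X_jX_j^T)^T\left(X_iX_i^T - E^{k+1}(i)\right)\right). \end{aligned}$$

Denoting a matrix by

$$\Psi^k = \left[\operatorname{tr}\left(E^{k+1}(i)^T X_jX_j^T\right)\right]_{i,j}$$

and noting (11), we will have

$$\|\mathcal{X} - \mathcal{X} \times_3 Z - \mathcal{E}^{k+1}\|_F^2 = \operatorname{tr}(Z\Delta Z^T) - 2\operatorname{tr}\left((\Delta - \Psi^k)Z\right) + \text{const}. \tag{22}$$

Combining (21) and (22), we have

$$f(Z) = \frac{\mu^k}{2}\operatorname{tr}(Z\Delta Z^T) - \mu^k\operatorname{tr}\left(\left(\Delta - \Psi^k + \frac{1}{\mu^k}\Phi^k\right)Z\right) + \text{const}.$$

Thus we have

$$\partial f(Z) = \mu^k Z\Delta - \mu^k\left(\Delta - \Psi^k + \frac{1}{\mu^k}\Phi^k\right)^T.$$

Finally we can use the following linearized proximity approximation to
replace (20) as follows

$$Z^{k+1} = \arg\min_{Z} \lambda\|Z\|_* + \langle \partial f(Z^k), Z - Z^k \rangle + \frac{\eta\mu^k}{2}\|Z - Z^k\|_F^2 = \arg\min_{Z} \lambda\|Z\|_* + \frac{\eta\mu^k}{2}\left\|Z - Z^k + \frac{\partial f(Z^k)}{\eta\mu^k}\right\|_F^2, \tag{23}$$

with a constant $\eta > \|\mathcal{X}\|^2$, where $\|\mathcal{X}\|$ is the
matrix norm of the third mode matricization of the tensor $\mathcal{X}$. The
new problem (23) has a closed form solution given by

$$Z^{k+1} = U_z S_{\frac{\lambda}{\eta\mu^k}}(\Sigma_z) V_z^T, \tag{24}$$

where $U_z \Sigma_z V_z^T$ is the SVD of
$Z^k - \frac{\partial f(Z^k)}{\eta\mu^k}$ and $S_{\tau}(\cdot)$ is the
Singular Value Thresholding (SVT) operator defined by

$$S_{\tau}(\Sigma) = \operatorname{diag}\left(\operatorname{sgn}(\Sigma_{ii})\max(|\Sigma_{ii}| - \tau, 0)\right).$$

Finally the procedure for solving the $\ell_2/\ell_1$ norm constrained GLRR
problem (8) is summarized in Algorithm 1. For the completeness of the paper,
we borrow the convergence analysis for Algorithm 1 from the literature
without proof.

Algorithm 1 Low-Rank Representation on Grassmann Manifold.

Input: The Grassmann sample set $\{X_i\}_{i=1}^{N}$,
$X_i \in \mathcal{G}(p, d)$, the cluster number $k$ and the balancing
parameter $\lambda$.
Output: The Low-Rank Representation $Z$
1: Initialize: $Z^0 = 0$, $\mathcal{E}^0 = 0$, $\rho_0 = 1.9$,
$\eta > \|\mathcal{X}\|^2$, $\mu^0 = 0.01$, $\mu_{\max} = 10^{[\ldots]}$,
$\varepsilon_1 = 10^{-[\ldots]}$ and $\varepsilon_2 = 10^{-[\ldots]}$.
2: Prepare $\Delta$ according to (11);
3: Compute $L$ by the Cholesky decomposition $\Delta = LL^T$;
4: while not converged do
5: Update $\mathcal{E}$ according to (19);
6: Update $Z$ according to (24);
7: Update $\xi$ according to (17);
8: Update $\mu$ according to the following rule:
$$\mu^{k+1} = \min\{\rho^k\mu^k, \mu_{\max}\},$$
where
$$\rho^k = \begin{cases} \rho_0 & \text{if } \dfrac{\mu^k}{\|\mathcal{X}\|}\max\{\sqrt{\eta}\|Z^{k+1} - Z^k\|_F,\ \|\mathcal{E}^{k+1} - \mathcal{E}^k\|_F\} < \varepsilon_2, \\ 1 & \text{otherwise.} \end{cases}$$
9: Check the convergence conditions:
$$\|\mathcal{X} - \mathcal{X} \times_3 Z^{k+1} - \mathcal{E}^{k+1}\| / \|\mathcal{X}\| < \varepsilon_1$$
and
$$\frac{\mu^k}{\|\mathcal{X}\|}\max\{\sqrt{\eta}\|Z^{k+1} - Z^k\|_F,\ \|\mathcal{E}^{k+1} - \mathcal{E}^k\|_F\} < \varepsilon_2$$
10: end while

Theorem 3. If $\mu^k$ is non-decreasing and upper bounded, and
$\eta > \|\mathcal{X}\|^2$, then the sequence generated by Algorithm 1
converges to a KKT point of problem (8).

4 Kernelized LRR on Grassmann Manifold

4.1 Kernels on Grassmann Manifold

In this section, we consider the kernelization of the GLRR-F model. In fact,
the LRR model on Grassmann manifold (6) can be regarded as a kernelized LRR
with a kernel feature mapping $\Pi$ defined by (4). It is not surprising
that $\Delta$ is positive semi-definite, as it serves as a kernel matrix. It
is natural to further generalize GLRR-F based on kernel functions on
Grassmann manifold.

There are a number of kernel functions proposed in recent years in the
computer vision and machine learning communities. For simplicity, we focus
on the following kernels:

1. The Projection Kernel: For any two Grassmann points $X_i$ and $X_j$, the
kernel value is

$$k^{\text{proj}}(X_i, X_j) = \|X_i^T X_j\|_F^2 = \operatorname{tr}\left((X_iX_i^T)^T(X_jX_j^T)\right).$$

The feature mapping of the kernel is actually the mapping defined in (4).
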


2. Canonical Correlation Kernel: This kernel is based on the cosine values
of the so-called principal angles between two subspaces, defined as follows

$$\cos(\theta_m) = \max_{u_m \in \operatorname{span}(X_i)}\ \max_{v_m \in \operatorname{span}(X_j)} u_m^T v_m,$$

such that $\|u_m\|_2 = \|v_m\|_2 = 1$;
$u_m^T u_k = 0$, $k = 1, 2, \ldots, m-1$;
$v_m^T v_l = 0$, $l = 1, 2, \ldots, m-1$.

We can use the largest canonical correlation value (the cosine of the first
principal angle) as the kernel value, i.e.,

$$k^{\text{cc}}(X_i, X_j) = \max_{x_i \in \operatorname{span}(X_i)}\ \max_{x_j \in \operatorname{span}(X_j)} \frac{x_i^T x_j}{\|x_i\|_2\|x_j\|_2}.$$

The cosines of the principal angles of two subspaces can be calculated by
using the SVD as discussed in [56], see Theorem 2.1 there.

Consider two subspaces $\operatorname{span}(X_i)$ and
$\operatorname{span}(X_j)$ as two Grassmann points, where $X_i$ and $X_j$
are given bases. If we take the following SVD

$$X_i^T X_j = USV^T,$$

then the values on the diagonal of the matrix $S$ are the cosine values of
all the principal angles. The kernel $k^{\text{cc}}(X_i, X_j)$ uses only
partial information regarding the two subspaces. To increase its performance
in our LRR, in this paper we use the sum of all the diagonal values of $S$
as the kernel value between $X_i$ and $X_j$. We still call this revised
version the canonical correlation kernel.
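
Both kernels only need the small $p \times p$ matrix $X_i^T X_j$, so they are cheap to evaluate. The following sketch computes the projection kernel and the revised (sum of cosines) canonical correlation kernel for a few random subspaces; the function names and the test data are assumptions made for illustration.

```python
import numpy as np

def projection_kernel(Xi, Xj):
    """k_proj = ||Xi^T Xj||_F^2 for orthonormal bases Xi, Xj of size d x p."""
    return float(np.linalg.norm(Xi.T @ Xj, "fro") ** 2)

def canonical_correlation_kernel(Xi, Xj):
    """Sum of the cosines of all principal angles (singular values of Xi^T Xj)."""
    return float(np.sum(np.linalg.svd(Xi.T @ Xj, compute_uv=False)))

def kernel_matrix(bases, kernel):
    """Symmetric N x N matrix of pairwise kernel values."""
    N = len(bases)
    K = np.empty((N, N))
    for i in range(N):
        for j in range(i, N):
            K[i, j] = K[j, i] = kernel(bases[i], bases[j])
    return K

rng = np.random.default_rng(2)
d, p, N = 8, 3, 5
bases = [np.linalg.qr(rng.standard_normal((d, p)))[0] for _ in range(N)]

K_proj = kernel_matrix(bases, projection_kernel)
K_cc = kernel_matrix(bases, canonical_correlation_kernel)
```

For identical subspaces all principal angles are zero, so both kernels return $p$ on the diagonal.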

4.2 Kernelized LRR on Grassmann Manifold 

Let $k$ be any kernel function on Grassmann manifold. According to kernel
theory, there exists a feature mapping $\phi$ such that

$$\phi : \mathcal{G}(p, d) \to \mathcal{F},$$

where $\mathcal{F}$ is the relevant feature space under the given kernel
$k$.

Given a set of points $\{X_1, X_2, \ldots, X_N\}$ on Grassmann manifold
$\mathcal{G}(p, d)$, we define the following LRR model

$$\min_{Z} \|\phi(\mathcal{X}) - \phi(\mathcal{X})Z\|_F^2 + \lambda\|Z\|_*. \tag{25}$$

We call the above model the Kernelized LRR on Grassmann manifold, denoted by
KGLRR, and by KGLRR-cc and KGLRR-proj for $k = k^{\text{cc}}$ and
$k = k^{\text{proj}}$, respectively. For KGLRR-proj, however, the above
model (25) becomes the LRR model on Grassmann manifold (10).

Denote by $K$ the $N \times N$ kernel matrix over all the data points $X$'s.
By a similar derivation, we can prove that the model (25) is equivalent to

$$\min_{Z} -2\operatorname{tr}(KZ) + \operatorname{tr}(ZKZ^T) + \lambda\|Z\|_*,$$

which is equivalent to

$$\min_{Z} \|ZK^{\frac{1}{2}} - K^{\frac{1}{2}}\|_F^2 + \lambda\|Z\|_*, \tag{26}$$

where $K^{\frac{1}{2}}$ is the square root matrix of the kernel matrix $K$.
So the kernelized model KGLRR-proj is similar to the GLRR-F model in
Section 3.

It has been shown that using multiple kernel functions may improve
performance in many application scenarios, owing to the virtues of different
kernel functions for complex data. So in practice, we can employ different
kernel functions to implement the model in (25), and we can even adopt a
combined kernel function. For example, in our experiments, we use a
combination of the above two kernel functions $k^{\text{cc}}$ and
$k^{\text{proj}}$ as follows:

$$k^{\text{cc+proj}}(X_i, X_j) = \alpha k^{\text{cc}}(X_i, X_j) + (1 - \alpha)k^{\text{proj}}(X_i, X_j),$$

where $\alpha$ is a hand-assigned combination coefficient. We denote the
Kernelized LRR model with $k = k^{\text{cc+proj}}$ by KGLRR-cc+proj.

4.3 Algorithm for KGLRR 

It is straightforward to use Theorem 2 to solve (26). For the sake of
convenience, we present the algorithm below.

Let us take the eigenvector decomposition of the kernel matrix $K$

$$K = UDU^T,$$

where $D = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_N)$ is the
diagonal matrix of all the eigenvalues. Then the solution to (26) is given
by

$$Z^* = UD_{\lambda}U^T,$$

where $D_{\lambda}$ is the diagonal matrix with elements defined by

$$D_{\lambda}(i,i) = \begin{cases} 1 - \dfrac{\lambda}{\sigma_i} & \text{if } \sigma_i > \lambda; \\ 0 & \text{otherwise.} \end{cases}$$

This algorithm is valid for any kernel function on Grassmann manifold.
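
The closed form above amounts to one symmetric eigendecomposition followed by a shrinkage of the eigenvalues. The sketch below implements the formula as printed. One caveat, stated here as an assumption rather than a claim about the paper: the threshold $1 - \lambda/\sigma_i$ is the minimizer when the quadratic term carries a factor $\frac{1}{2}$, so the helper `half_scaled_objective` (a hypothetical name) uses that scaling for a numerical optimality check. The projection kernel on random subspaces is used only as example input.

```python
import numpy as np

def closed_form_lrr(K, lam):
    """Z* = U D_lam U^T with D_lam[i] = 1 - lam / sigma_i if sigma_i > lam, else 0."""
    K = 0.5 * (K + K.T)                   # guard against round-off asymmetry
    sigma, U = np.linalg.eigh(K)          # K = U diag(sigma) U^T
    shrink = np.zeros_like(sigma)
    keep = sigma > lam
    shrink[keep] = 1.0 - lam / sigma[keep]
    return (U * shrink) @ U.T

def half_scaled_objective(Z, K, lam):
    """0.5 * ||Z K^{1/2} - K^{1/2}||_F^2 + lam * ||Z||_* (the 1/2 factor is an assumption)."""
    sigma, U = np.linalg.eigh(0.5 * (K + K.T))
    K_half = (U * np.sqrt(np.clip(sigma, 0.0, None))) @ U.T
    data_term = 0.5 * np.linalg.norm(Z @ K_half - K_half, "fro") ** 2
    return data_term + lam * np.linalg.norm(Z, "nuc")

rng = np.random.default_rng(3)
d, p, N = 6, 2, 5
bases = [np.linalg.qr(rng.standard_normal((d, p)))[0] for _ in range(N)]
# Projection kernel matrix as example input.
K = np.array([[np.linalg.norm(Bi.T @ Bj, "fro") ** 2 for Bj in bases] for Bi in bases])

lam = 0.1
Z_star = closed_form_lrr(K, lam)
```

The resulting `Z_star` is symmetric with eigenvalues in $[0, 1)$; directions of $K$ whose eigenvalue does not exceed $\lambda$ are discarded, which is where the low rank of the representation comes from.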


5 Experiments 

To investigate the performance of our proposed methods, GLRR-21,
GLRR-F/KGLRR-proj, KGLRR-cc and KGLRR-cc+proj, we conduct clustering
experiments on several widely used public databases: the MNIST handwritten
digits database, the DynTex++ database [60], the Highway Traffic Dataset and
the YouTube Celebrity (YTC) dataset. The clustering results are compared
with three state-of-the-art clustering algorithms: SSC, LRR and the
Statistical Computations on Grassmann and Stiefel Manifold (SCGSM) method.
All the algorithms are coded in Matlab 2014a and run on an Intel Core
i7-4770K 3.5GHz CPU machine with 16 GB RAM. In the following, we first
describe each dataset and experiment setting, then report and analyze our
experiment results.


5.1 Datasets and Experiment Setting 

5.1.1 Datasets 

Four widely used public datasets are used to test the 
chosen algorithms. They are 

1) MNIST handwritten digit database 

The database consists of approximately 70,000 digit
images written by 250 volunteers. For recognition
applications, 60,000 images are generally used as the
training set and the other 10,000 images as the testing
set. All the digit images have been size-normalized and
centered in a fixed size of 28 × 28. Some samples of this
database are shown in Fig. 2. As the samples in this
database are sufficient and the images are almost
noise-free, we choose this database to test the performance
of our clustering methods in an ideal condition and in noisy
conditions at different levels, in order to gain some
insight into the new methods.

2) DynTex++ database 

The database is derived from a total of 345 video
sequences in different scenarios, which contain river
water, fish swimming, smoke, clouds and so on. Some
frames of the videos are shown in Fig. 3. The videos are
labeled as 36 classes and each class has 100 subsequences
(3600 subsequences in total) with a fixed size of
50 × 50 × 50 (50 gray frames). This is a challenging
database for clustering because most textures from
different classes are fairly similar and the number of
classes is quite large. We select this database to test the
clustering performance of the proposed methods for the case
of a large number of classes.

Fig. 2. The MNIST digit samples for experiments.

Fig. 3. DynTex++ samples. Each row is from the same
video sequence.

3) YouTube Celebrity dataset (YTC) [?], [63]

The dataset is downloaded from YouTube. It contains
videos of celebrities taking part in activities under
real-life scenarios in various environments, such as news
interviews, concerts, films and so on. The dataset comprises
1,910 video clips of 47 subjects and each clip has
more than 100 frames. We test the proposed methods
on a face dataset detected from the video clips. It is
a quite challenging dataset since the faces are all of
low resolution with variations in expression, pose and
background. Some samples of the YTC dataset are shown in
Fig. 4.

4) Highway traffic dataset [?]

The dataset contains 253 video sequences of highway
traffic at three levels (light, medium and heavy) in
various weather conditions such as sunny, cloudy and rainy.
Each video sequence has 42 to 52 frames. Fig. 5 shows
some frames of traffic scenes at the three levels. The video
sequences are converted to grey images and each image
is normalized to size 48 × 48 with zero mean and unit
variance. This database is challenging because the scenes
and their weather context change over time. So it is a
good dataset for evaluating the clustering methods in
real-world scenes.

Fig. 4. YouTube Celebrity samples. Each row includes
frames from different video sequences of the same person.

Fig. 5. Highway Traffic scene samples. Sequences at
three different levels: the first row is at light level, the
second row at medium level and the last row at heavy level.

5.1.2 Experiment Setting 

The GLRR model is designed to cluster Grassmann points,
which are subspaces instead of raw object/signal points.
Thus before implementing the main components of
GLRR and the spectral clustering algorithm (here we
use the NCut algorithm), we must represent the raw signals
in a subspace form, i.e., as points on the Grassmann
manifold. As a subspace can be generally represented
by an orthonormal basis, we utilize the samples drawn
from the same subspace to construct its basis
representation. Similar to the previous work [42], [64], we
simply adopt Singular Value Decomposition (SVD) to
construct a subspace basis. Concretely, given a set of
images, e.g., the same digit written by the same person,
denoted by $\{Y_i\}_{i=1}^{P}$, where each $Y_i$ is a grey-scale
image of dimension $m \times n$, we can construct a matrix
$\Gamma = [\mathrm{vec}(Y_1), \mathrm{vec}(Y_2), \ldots, \mathrm{vec}(Y_P)]$ of size $(m * n) \times P$
by vectorizing each image $Y_i$. Then $\Gamma$ is decomposed
by SVD as $\Gamma = U \Sigma V$. We can pick the first $p$ singular
vectors of $U$ to represent the image set as a point $X$ on
the Grassmann manifold $\mathcal{G}(p, m * n)$.
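
The construction of a Grassmann point from an image set can be sketched as follows. This is a minimal illustration rather than the authors' Matlab code: the function name is hypothetical, random arrays stand in for real images, and images are flattened in row-major order, which is an arbitrary but consistent choice for vec(·).

```python
import numpy as np

def grassmann_point(images, p):
    """Orthonormal basis (m*n x p) given by the p leading left singular vectors."""
    # Stack vec(Y_1), ..., vec(Y_P) as the columns of Gamma.
    gamma = np.stack(
        [np.asarray(img, dtype=float).ravel() for img in images], axis=1
    )
    # Thin SVD: columns of U are the left singular vectors, ordered by
    # decreasing singular value.
    U, _, _ = np.linalg.svd(gamma, full_matrices=False)
    return U[:, :p]

rng = np.random.default_rng(0)
image_set = [rng.random((28, 28)) for _ in range(20)]  # stand-in for 20 digit images
X = grassmann_point(image_set, p=20)
```

For 20 images of size 28 × 28 and p = 20, the resulting basis has size 784 × 20 and satisfies $X^T X = I_p$.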

The setting of the model parameters affects the performance
of our proposed methods. $\lambda$ is the most important
penalty parameter for balancing the error term and the
low-rank term in our proposed methods. Empirically, the
best value of $\lambda$ varies widely across applications, and it
has to be chosen from a large range of values to get good
performance in a particular application. From our
experiments, we have observed that, for a fixed database,
the best $\lambda$ decreases as the number of clusters increases,
and that the best $\lambda$ is smaller when the noise level of the
data is lower and larger when the noise level is higher.
This observation can be used as guidance for future
applications of the methods. On the other hand, the error
tolerances are also important in controlling the
termination condition, since they bound the allowed
reconstruction errors. We experimentally seek proper
tolerance values to make the algorithms terminate at an
appropriate stage with better errors.

For the conventional SSC and LRR methods, Grassmann
points cannot be used directly as inputs. In fact, our
experiments confirm that this naive strategy results in
poorer performance for both SSC and LRR. To construct a fair
comparison between SSC or LRR and our Grassmann-based
algorithms, we adopt the following strategy to
construct training data for SSC and LRR. For each image
set, we "vectorize" it into one long vector containing all
the raw data in the image set, in a carefully chosen order,
e.g., in the frame order. In most of the experiments,
we cannot simply take these vectors as inputs to the SSC
and LRR algorithms because of their high dimensionality for
larger image sets. In this case, we apply PCA to reduce
the raw vectors to a low dimension, which equals
either the dimension of the subspaces of the Grassmann
manifold or the number of PCA components retaining 90%
of the variance energy. The PCA-projected vectors are then
taken as the inputs to the SSC and LRR algorithms.
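
The dimension-reduction step for the SSC and LRR baselines can be sketched as below: PCA computed through an SVD of the centered data, keeping the smallest number of components whose cumulative variance reaches 90%. The function name and the synthetic data are hypothetical, and the exact PCA routine used in the experiments is not specified in the text.

```python
import numpy as np

def pca_reduce(V, energy=0.90):
    """Project the rows of V (samples x features) onto the fewest principal
    components whose cumulative variance ratio reaches `energy`."""
    Vc = V - V.mean(axis=0)                        # center each feature
    _, s, Vt = np.linalg.svd(Vc, full_matrices=False)
    var = s ** 2                                   # variance along each component
    ratio = np.cumsum(var) / var.sum()
    k = int(np.searchsorted(ratio, energy) + 1)    # first k with ratio >= energy
    return Vc @ Vt[:k].T, k

# Synthetic stand-in for vectorized image sets: 40 samples, 500 features,
# with feature scales that decay so a few components dominate.
rng = np.random.default_rng(0)
V = rng.standard_normal((40, 500)) * np.linspace(3.0, 0.1, 500)
Y, k = pca_reduce(V, energy=0.90)
```

The returned `k` plays the role of the reduced dimension quoted in the experiments (for example 315 for MNIST), although the value obtained here depends entirely on the synthetic data.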

5.2 MNIST Handwritten Digit Clustering

In this experiment, we simply test our algorithms on
the test dataset of MNIST. We divide the 10,000 images into
N = 495 subgroups so that each subgroup consists of 20
images of a particular digit, to simulate images from
the same person. Thus our task is to cluster N = 495
image subgroups into 10 categories. As described in the
last section, we use p = 20 leading singular vectors to
represent each subgroup as a Grassmann point X. Thus
the size of the representative matrix of a Grassmann
point is (28 * 28) × 20.
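
The grouping step can be sketched as follows. The text does not say how leftover images are handled, so this sketch assumes that incomplete groups of each digit are simply dropped. The function name is hypothetical, and the per-digit counts are the commonly reported ones for the MNIST test set, used here as an assumption in place of loading the data.

```python
import numpy as np

def make_subgroups(labels, group_size=20):
    """Split sample indices into same-label groups of `group_size`;
    leftover samples of each label are dropped (an assumption)."""
    groups, group_labels = [], []
    for digit in np.unique(labels):
        idx = np.flatnonzero(labels == digit)
        n_full = len(idx) // group_size            # number of complete groups
        for g in range(n_full):
            groups.append(idx[g * group_size:(g + 1) * group_size])
            group_labels.append(int(digit))
    return groups, np.array(group_labels)

# Commonly reported per-digit counts of the MNIST test set (assumed here).
counts = [980, 1135, 1032, 1010, 982, 892, 958, 1028, 974, 1009]
labels = np.repeat(np.arange(10), counts)
groups, group_labels = make_subgroups(labels)
print(len(groups))  # 495
```

Under these assumptions the number of complete groups is 49 + 56 + 51 + 50 + 49 + 44 + 47 + 51 + 48 + 50 = 495, which agrees with the value of N used in this experiment.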

For SSC and LRR, the size of the input vector becomes
28 * 28 * 20 = 15680, which is too large to handle on a
desktop machine. We use PCA to reduce each vector to
dimension 315 by keeping 90% of the variance energy. This
dimension increases as the noise level increases.

After getting the low-rank representation of the Grassmann
points mentioned above, we pass the affinity matrix
$\mathrm{abs}(Z) + \mathrm{abs}(Z^T)$ to NCut for clustering. The
experiment results are reported in Table 1. It is shown
that the accuracies of our proposed algorithms, GLRR-21,
GLRR-F/KGLRR-proj, KGLRR-cc and KGLRR-cc+proj,
are all 100%, outperforming the other methods by more than
10 percentage points. The manifold mapping extracts more
useful information about the differences among sample data.
Thus the combination of Grassmann geometry and the LRR
model brings better accuracy for NCut clustering.
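
The clustering stage can be sketched as below. The affinity is built as abs(Z) + abs(Z^T) as described, but the clustering routine is a stand-in: a generic normalized spectral clustering (normalized affinity, leading eigenvectors, row normalization, then a plain k-means), not the NCut implementation used in the experiments. All function names and the toy coefficient matrix are hypothetical.

```python
import numpy as np

def affinity_from_Z(Z):
    # Symmetric, nonnegative affinity: abs(Z) + abs(Z^T).
    return np.abs(Z) + np.abs(Z.T)

def spectral_embedding(W, n_clusters):
    # Leading eigenvectors of D^{-1/2} W D^{-1/2}, rows scaled to unit length.
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(S)                # eigenvalues in ascending order
    E = vecs[:, -n_clusters:]
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    return E / np.maximum(norms, 1e-12)

def kmeans(E, n_clusters, n_iter=50):
    # Lloyd iterations with a deterministic farthest-point initialization.
    centers = [E[0]]
    for _ in range(1, n_clusters):
        dist = np.min([np.sum((E - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(assign == k):
                centers[k] = E[assign == k].mean(axis=0)
    return assign

# Toy coefficient matrix (hypothetical): two diagonal blocks plus small noise.
rng = np.random.default_rng(0)
Z = 0.01 * rng.standard_normal((8, 8))
Z[:4, :4] += 0.5
Z[4:, 4:] += 0.5
labels = kmeans(spectral_embedding(affinity_from_Z(Z), 2), 2)
```

Symmetrizing with absolute values makes the affinity independent of the sign and asymmetry of the learned coefficients, which is what a graph-cut method needs as input.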

TABLE 1
Subspace clustering results on the MNIST database.

Method     SSC [?]   LRR      SCGSM    GLRR-21   GLRR-F [33]/KGLRR-proj   KGLRR-cc   KGLRR-cc+proj
Accuracy   0.7576    0.8667   0.8646   1         1                        1          1

To test the robustness of the proposed algorithms, we
add Gaussian noise $\mathcal{N}(0, \sigma^2)$ onto all the digit images
and then cluster them with the different algorithms
mentioned above. Fig. [?] shows some digit images with noise
$\sigma = 0.3$. Generally, noise affects the performance
of the clustering algorithms, especially when the noise
is heavy. Table 2 shows the clustering performance of the
different methods with the noise standard deviation $\sigma$
ranging from 0.05 to 0.35. It indicates that our algorithms
keep 100% accuracy for standard deviations up to
0.3, while the accuracy of the other methods is generally
lower than that of our methods and behaves unstably as
the noise standard deviation varies. This indicates that
our proposed algorithms are robust to a certain level of
noise.
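
The noise model used in this robustness test can be sketched as follows. The sketch assumes pixel intensities scaled to [0, 1] and no clipping after the noise is added; neither detail is stated in the text. The function name and the random stand-in images are hypothetical.

```python
import numpy as np

def add_gaussian_noise(images, sigma, rng):
    """Add i.i.d. N(0, sigma^2) noise to every pixel (no clipping)."""
    return images + rng.normal(loc=0.0, scale=sigma, size=images.shape)

rng = np.random.default_rng(0)
clean = rng.random((100, 28, 28))   # stand-in images with intensities in [0, 1]
noisy = add_gaussian_noise(clean, sigma=0.3, rng=rng)
```

The noisy images would then go through the same subgrouping, SVD and clustering steps as the clean ones.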

We further study the impact of $\lambda$ on the performance
of the clustering methods by varying its value. From these
experiments, it is observed that the best $\lambda$ depends on the
noise level. Generally, a relatively larger $\lambda$ gives better
clustering results when the noise level is higher. This
is because the noise level affects the rank of
the low-rank representation $Z$: a higher noise level
increases the rank of the coefficient matrix.
So $\lambda$ should be increased if we have prior knowledge
of a higher noise level.

5.3 Dynamic Texture Clustering 

For the texture video sequences, the dynamic texture
descriptor Local Binary Patterns from Three Orthogonal
Planes (LBP-TOP) [?] is considered more suitable for
capturing their spatial and temporal features. So we use
LBP-TOP to construct the dynamic texture points on the
Grassmann manifold instead of the former SVD method.
Generally, the LBP-TOP method extracts the local
co-occurrence features of a dynamic texture from three
orthogonal planes of the sequential space. For the 3600
subsequences in the DynTex++ database, the LBP-TOP
features are extracted to obtain 3600 matrices, each of
size 177 × 14. We directly use these feature matrices
as the points on the Grassmann manifold. As the number of
classes over all the 3600 subsequences is large, we pick
the first C (= 3, ..., 10) classes from the 36 classes and
50 subsequences from each class for clustering. The
experiments are repeated several times for each C. For SSC
and LRR, the size of the input vector is 50 * 50 * 50, which
is too large even for the PCA algorithm. So we employ
2D PCA [?] to reduce the dimension to the subspace
dimension of the Grassmann manifold.

The clustering results for the DynTex++ database are
shown in Table 3. For more than 4 classes, the accuracy of
the proposed methods is superior to that of the other
methods by around 10 percentage points. The accuracy of
KGLRR-cc+proj is higher than that of GLRR-21 and GLRR-F
except for the case of 9 classes, which suggests that the
kernel version is more stable. We also observe that the
accuracy decreases as the number of classes increases. This
may be caused by the increased clustering difficulty when
more similar texture images are added into the data set.


5.4 YouTube Celebrity Clustering 

In order to create a face dataset from the YTC videos,
a face detection algorithm is used to extract face
regions, and each face is resized to a 20 × 20 image. We
treat the faces extracted from each video as an image set,
which is represented as a point on the Grassmann manifold
by the SVD method, as in the handwritten digit
case. Each face image set contains a varying number of
face images; however, we fix the dimension of the subspaces
to p = 20. Since the number of frames in the YTC videos
varies widely, from 13 to 349, and the PCA algorithm
requires every sample to have the same dimension, it would
be unfair to select only a few frames from each video as
the input data for the SSC and LRR algorithms. Hence we do
not compare our methods with SSC and LRR on this dataset.

We simply choose C (= 4, ..., 10) persons, respectively,
as the target classes from the total of 47 persons and test
the proposed algorithms over all the face image sets
of the chosen persons. Table 4 shows the clustering
results on the YTC face dataset with different numbers of
selected persons. The accuracy of our methods, especially
the kernel methods, is significantly higher than that of
the other methods. As in the DynTex++ texture experiment,
as the number of persons (classes) increases, the accuracy
of most algorithms decreases, and KGLRR-cc+proj behaves
more stably. Because GLRR-21 consumes so much memory, we
could not test a wide range of $\lambda$ to get a better
result; instead we had to relax the termination condition
and select some values of $\lambda$ empirically. The accuracy
reported for GLRR-21 is therefore not its best result. All
the other methods are tested on a wide
range of $\lambda$ from 0.1 to 50.

TABLE 2
Subspace clustering results on the MNIST database.

Noise   SSC [?]   LRR      SCGSM    GLRR-21   GLRR-F [33]/KGLRR-proj   KGLRR-cc   KGLRR-cc+proj
0.05    0.7838    0.8667   0.8646   1         1                        1          1
0.1     0.7596    0.8889   0.7091   1         1                        1          1
0.15    0.7475    0.9939   0.8667   1         1                        1          1
0.2     0.6202    0.9960   0.7374   1         1                        1          1
0.25    0.3374    0.9960   0.7293   1         1                        1          1
0.3     0.2020    0.8909   0.6828   1         1                        1          1
0.35    0.1556    0.8263   0.2646   0.8889    0.996                    0.8566     0.9838


TABLE 3
Subspace clustering results on the DynTex++ database.

Class   SSC      LRR      SCGSM    GLRR-21   GLRR-F [33]/KGLRR-proj   KGLRR-cc   KGLRR-cc+proj
3       0.6700   0.9967   1        1         1                        1          1
4       0.7075   0.8625   0.9050   0.9975    0.9975                   0.9975     0.9975
5       0.5060   0.7280   0.8340   0.9840    0.9740                   0.9980     0.9880
6       0.4167   0.5933   0.7367   0.9250    0.8683                   0.9150     0.9300
7       0.3371   0.5643   0.5914   0.9071    0.8857                   0.8757     0.9100
8       0.4187   0.4788   0.6313   0.8725    0.8513                   0.8675     0.8738
9       0.3556   0.4378   0.5044   0.7689    0.8056                   0.7867     0.7800
10      0.2550   0.4440   0.4790   0.6940    0.7620                   0.7150     0.8110


TABLE 4
Subspace clustering results for different number of persons on the YTC face database.

Class   SCGSM    GLRR-21   GLRR-F/KGLRR-proj   KGLRR-cc   KGLRR-cc+proj
4       0.5282   0.6972    0.8944              0.8944     0.9085
5       0.7188   0.9167    0.9167              0.9167     0.9167
6       0.5925   0.8566    0.8604              0.8604     0.8604
7       0.5955   0.7612    0.8034              0.7697     0.8174
8       0.6624   0.8135    0.8264              0.8006     0.8264
9       0.6974   0.6785    0.7470              0.7447     0.7825
10      0.5264   0.6892    0.7400              0.7294     0.7569


5.5 UCSD Traffic Clustering 

The traffic video clips in the database are labeled into
three classes based on the level of traffic jam. There are
44 clips at heavy level, 45 clips at medium level and 164
clips at light level. We regard each video as an image
set to construct a point on the Grassmann manifold, again
using the SVD method. The subspace dimension p
is set to 20, the number of clusters is C = 3 and the
total number of samples is N = 253. For SSC and LRR,
we vectorize the first 42 frames of each clip (there
are 42 to 52 frames in a clip) and then use PCA to
reduce the dimension (24 * 24 * 42) to 147 by keeping 90%
of the variance energy. Note that the levels of traffic jam
do not have sharp borderlines. For some ambiguous clips, it
is difficult to say whether they belong to the heavy,
medium or light level. So it is a very challenging task for
clustering methods.

Table 5 presents the clustering performance of all
the algorithms on the Traffic dataset with two different
frame sizes. The accuracy of our methods, except for
KGLRR-proj, is at least 10 percent higher than that of the
other methods. When the frame size is 48 * 48,
KGLRR-cc+proj achieves the highest accuracy, 0.8972, which
almost reaches the accuracy of some supervised learning
based classification algorithms [?]. However, constrained
by the CPU resource, we cannot report the results of
GLRR-21, SSC and LRR for this frame size.

TABLE 5
Subspace clustering results on the Traffic database.

Size    SSC [?]   LRR      SCGSM    GLRR-21   GLRR-F [33]/KGLRR-proj   KGLRR-cc   KGLRR-cc+proj
48*48   -         -        0.6643   -         0.6640                   0.8972     0.8972
24*24   0.6522    0.6838   0.6087   0.7747    0.7905                   0.8261     0.8221

6 Conclusion and Future Work

In this paper, we propose a novel LRR model on the
Grassmann manifold by utilizing the embedding mapping
from the manifold onto the space of symmetric matrices
to construct a metric in Euclidean space. To treat different
noises, the proposed GLRR is further extended to two
models, GLRR-F and GLRR-21, to deal with Gaussian
noise and non-Gaussian noise with outliers, respectively.
We derive an equivalent optimization problem which
has a closed-form solution for GLRR-F. In addition, we
show that the LRR model on the Grassmann manifold can be
generalized under the kernel framework, and two special
kernel functions on the Grassmann manifold are
incorporated into the kernelized GLRR model. The proposed
models and algorithms are evaluated on several public
databases against several existing clustering algorithms.
The experimental results show that the proposed methods
outperform the state-of-the-art methods and behave
robustly to noise and outliers. This work provides a
novel idea for constructing LRR models for data on
manifolds, and it demonstrates that incorporating the
geometrical properties of manifolds via an embedding
mapping actually facilitates learning on manifolds. In
future work, we will focus on exploring the intrinsic
properties of the Grassmann manifold to construct LRR on it.